
(2017) Multi-scale dense convolutional networks for efficient prediction

Keyword [Multi-Scale DenseNet]

Huang G, Chen D, Li T, et al. Multi-scale dense convolutional networks for efficient prediction[J]. arXiv preprint arXiv:1703.09844, 2017.



1. Overview


1.1. Motivation

  • A small model handles easy examples well, but makes mistakes on hard ones
  • Hard examples need a large model, which is wasteful on easy ones


  • Only the last layer's features are used for classification; early-layer features are not discriminative enough
  • Early layers produce fine-scale feature maps, later layers coarse-scale ones

In this paper, the Multi-Scale DenseNet (MSDNet) is proposed, which automatically

  • spends little computation on easy examples
  • spends more computation on hard examples

It supports two settings:

  • anytime classification
  • budgeted batch classification (early exit via a probability threshold)

and has two key features:

  • multi-scale feature maps with multiple classifiers
  • dense connectivity

1.2. Contribution

  • First deep learning architecture of its kind that allows dynamic resource adaptation with a single model
  • First to show that dense connectivity is crucial for early-exit classifiers

1.3. Setting

1.3.1. Anytime prediction

  • may be stopped at any point in time (when the budget is exhausted)
  • must then return its most recent prediction
  • the budget is nondeterministic and varies per test instance


  1. $L$: suitable loss function
  2. $B$: computational budget
  3. $f$: model
  4. $x$: input image
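
The four symbols above combine into the anytime objective (a hedged reconstruction of the omitted formula; the expectation is taken over the joint distribution $P(x, B)$ of test inputs and budgets):

```latex
L(f) \;=\; \mathbb{E}\big[\, L(f(x),\, B) \,\big]_{P(x,\,B)}
```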

1.3.2. Budgeted batch classification

  • a batch of M examples shares a total budget B; each example exits once the classifier is sufficiently confident
  • easy examples consume less than B/M computation
  • hard examples may consume more than B/M computation
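
The early-exit rule above can be sketched as a confidence-thresholded cascade. This is a minimal stand-alone sketch, not the paper's implementation: `classifiers` is a hypothetical list of callables standing in for MSDNet's intermediate classifiers, each mapping an input to class logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def budgeted_batch_predict(classifiers, x, threshold=0.9):
    """Run early-exit classifiers in order; stop as soon as the top
    softmax probability exceeds `threshold`.
    Returns (predicted class index, index of the exiting classifier)."""
    probs = None
    for k, clf in enumerate(classifiers):
        probs = softmax(clf(x))
        if max(probs) >= threshold:
            return probs.index(max(probs)), k
    # budget exhausted: fall back to the final classifier's prediction
    return probs.index(max(probs)), len(classifiers) - 1
```

With a confidently-classified ("easy") input the cascade exits at the first classifier; an ambiguous ("hard") input runs through all of them, which is exactly how computation is redistributed from easy to hard examples.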


1.4. Related Work

  • Computation-efficient
    • prune weights
    • quantize weights
    • compact model
    • knowledge-distillation
  • Resource-efficient
    • FractalNet
    • Adaptive computation time approach

1.5. Visualization



1.6. Future Work

  • Extend to other tasks, e.g. image segmentation
  • Combine MSDNet with model compression, spatially adaptive computation, and more efficient convolution operations



2. Multi-Scale DenseNet


2.1. Problem



  • Lack of coarse-level features in early layers

    • the accuracy of an intermediate classifier correlates with its position in the network
    • Solution: multi-scale feature maps
  • Early classifiers interfere with later classifiers

    • Solution: dense connectivity

2.2. Model



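A minimal sketch of the two ideas above (multi-scale feature maps plus dense connectivity). This is pure bookkeeping, not the real architecture: feature maps are plain lists of floats, `conv` is a placeholder average standing in for a learned convolution, and `downsample` stands in for a strided convolution between scales.

```python
def downsample(fmap):
    """Halve resolution: stand-in for a stride-2 convolution."""
    return fmap[::2]

def conv(fmap):
    """Placeholder for a learned convolution: a 2-tap moving average."""
    n = len(fmap)
    return [(fmap[i] + fmap[min(i + 1, n - 1)]) / 2.0 for i in range(n)]

def msdnet_two_scales(x, depth=3):
    """Bookkeeping sketch of MSDNet with 2 scales.
    fine[l] / coarse[l] collect every layer's output at that scale.
    Dense connectivity: each new layer consumes the concatenation of ALL
    previous outputs at its scale.
    Multi-scale: the coarse scale additionally receives a downsampled
    view of the fine scale, so coarse features exist from layer 1 on."""
    fine = [x]                       # scale-1 outputs produced so far
    coarse = [downsample(x)]         # scale-2 outputs produced so far
    for _ in range(depth):
        fine_in = sum(fine, [])      # dense: concat all fine outputs
        coarse_in = sum(coarse, [])  # dense: concat all coarse outputs
        fine.append(conv(fine_in))
        # coarse layer sees its own scale AND a strided view of the fine scale
        coarse.append(conv(coarse_in + downsample(fine_in)))
    return fine, coarse
```

Because inputs are concatenations of everything before them, layer inputs grow with depth, which is the DenseNet-style feature reuse that keeps early-exit classifiers from degrading later ones.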

2.3. Lazy Evaluation

Classifiers read only the coarsest-scale feature maps, so finer-scale feature maps that do not influence the current classifier's prediction need not be computed.
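
This can be made concrete with memoized thunks (a generic lazy-evaluation sketch, not the paper's implementation): each feature map is wrapped in a callable that only runs when a consumer actually requests it, so an early exit never pays for the unused fine-scale map.

```python
from functools import lru_cache

calls = []  # records which feature maps were actually computed

def make_feature(name, fn):
    """Wrap a feature-map computation in a memoized thunk: `fn` runs
    only if (and when) some consumer calls the thunk."""
    @lru_cache(maxsize=None)
    def thunk():
        calls.append(name)
        return fn()
    return thunk

# hypothetical two-scale layer: the early-exit classifier reads only the
# coarse map, so the fine map is never computed when we exit early
fine = make_feature("fine", lambda: [1.0, 2.0, 3.0, 4.0])
coarse = make_feature("coarse", lambda: [2.5])

prediction = coarse()[0]  # classifier consumes coarse features only
```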

2.4. Loss Function


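The equation itself did not survive extraction; a hedged reconstruction of the training objective, a weighted cumulative loss (e.g. cross-entropy) over the $K$ classifiers $f_k$ on training set $D$:

```latex
L \;=\; \frac{1}{|D|} \sum_{(x,\,y)\in D} \;\sum_{k=1}^{K} w_k \, L\big(f_k(x),\, y\big)
```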

Empirically, setting all $w_k = 1$ works well.